Method and system for determining absolute mRNA quantities




The invention is directed to a method and a system for determining absolute mRNA
quantities by means of spotted cDNA microarrays.

The use of DNA microarrays has revolutionized the investigation of gene expression due
to its ability to measure mRNA quantities for thousands of different transcripts at the
same time. Briefly, representative single-stranded DNA fragments are immobilized on
a solid support (e.g. a glass slide) to probe for complementary cDNAs or cRNAs. These
are prepared from biological samples as a representation of their mRNA pool. The
preparation involves reverse transcription and, in the case of cRNA, additional
in-vitro transcription as a means of amplification. Complementary DNAs or cRNAs are
concomitantly labeled by incorporation of radioactive isotopes or of nucleotides that have
been modified by attaching a fluorescent dye. After hybridization to the microarray,
weakly or unspecifically bound cDNAs or cRNAs are washed away, and the signal is read
out by a phosphorimager (for radioactive label) or a confocal laser scanning device (for
fluorescent label). Computational image analysis determines for each feature (called
"spot") on the array its position (and thereby its identity) and the signal intensity, which
is believed to be linearly related to the original amount of cDNA or cRNA of that species
that is represented by this feature.

With DNA microarrays, it has become possible to monitor changes in mRNA levels for
thousands of different mRNAs in a single experiment. The technique of "spotted" cDNA
chips has found widespread use, but currently allows measuring only relative changes of
mRNA concentration in a cell. Moreover, due to technical limitations in the
manufacturing process, results can only be made comparable for chips that stem from the
same production series, i.e. experiments are currently limited to approximately 50-200
hybridizations.

There are still a number of technical problems that have to be corrected for in later data
analysis. The most prominent problem is normalization, or standardization, which
accounts for differences in labeling efficiency and mRNA amount between two samples
whose cDNA representation is hybridized to two different chips, or to the same chip but
labeled with fluorescent tags that light up in different colors when excited by UV laser
light, and can thus be distinguished.

The object underlying the present invention is to provide an improved method and system 
for determining absolute mRNA quantities. 

This object is achieved with the subject-matter as recited in the claims. 
The invention provides a method to overcome these technical limitations and measure
absolute mRNA quantities, which are comparable independent of the chip series
used. The method involves a dilution series of control spots on the microarray and
hybridization with a corresponding control cDNA of known concentration. According to
a preferred embodiment, a model curve is then fitted to the obtained signals for the
control, and can be used to calculate absolute mRNA concentrations in the sample used.
Since these measurements are done with cDNA microarrays, they are able to incorporate
as many as 25,000 different mRNA species in a single hybridization experiment.

Another problem that could not be solved yet is that chips that have been produced by
"spotting" display high batch-to-batch variation in the surface concentration of DNA
fragments in individual spots. In this method, cDNA templates for the genes that are to be
probed for are first amplified by large-scale multiplex polymerase chain reaction (PCR).
The amplified fragments are then transferred to the microarray supports (mostly glass
chips with chemically modified surfaces) by means of roboting devices, which are able to
deliver nanoliter quantities with a spatial precision of better than 100 µm. Recently,
ink-jet-like printing devices have been used for the same purpose. In any case, once the
produced PCR fragments are exhausted, a new PCR preparation has to be started from
the master templates. It is currently impossible to control the yield of the PCR reaction
for thousands of different templates at the same time. Thus, spotting microarrays from a
different PCR batch results in chips that vary in individual spot DNA concentration by
factors of up to 100 compared with a previous microarray production series. The
influence of the DNA spot concentration on recorded signal intensity after hybridization
is immediately obvious, and cannot be overcome by competitive hybridization where
only a ratio of transcript levels is measured for two samples, one whose cDNA
representation serves as internal reference, and one whose mRNA pool is to be
investigated. The incomparability has been conclusively demonstrated by Yue and
coworkers, who showed that this ratio is considerably dependent on the DNA spot
concentration, and will be the more "compressed" (nearer to 1) the smaller the DNA spot
concentration is.

Here, we present a novel method that allows a subsequent computational correction for
the influence of the individual DNA spot concentration. It requires one (or several)
dilution series of control cDNAs to be present on the array, a hybridization with a
universal target that binds to every spot on the array, and addition of complementary
controls to the mRNA or cDNA preparation in defined concentrations. By fitting a model
curve to the data obtained from the hybridization with the universal target, a subset of
spots can be identified whose corresponding values need correction, whereas for the
remaining spots correction would do more harm than good. The method extends current
microarray applications in two directions. First, it is now possible to measure absolute
quantities of different mRNAs by microarrays. Second, and more importantly, it is now
possible to compare experiments done on different series of chips, and moreover also
between different transcript level measuring systems like cDNA chips, oligonucleotide
chips, SAGE (serial analysis of gene expression), and multiplex real-time reverse
transcription (RT) PCR. This extends the "horizon of comparability", which is
currently limited to about 50-200 chips, to the levels that are required for new
applications that require investigation of up to 500 samples, e.g. classification of cancer
subtypes by means of gene expression profiling.

Throughout this text, the following terms will be used:

- array, microarray, chip: a solid support (e.g. chemically modified glass or
nylon/polypropylene membrane) that contains a number of features (spots), each
containing a single species of DNA fragments

- feature, spot: a single location on a microarray where only one species of DNA 
fragments is bound 

- hybridization: the process by which two complementary DNA molecules bind to 
each other, being driven by Watson-Crick base pairing 

- probe: one of the array-bound DNA fragments that are meant to probe for a 
specific transcript 

- (solute) target: the cDNA or cRNA preparation from a biological sample, 
intended to be the representation of the sample's mRNA pool 

- signal intensity: the intensity of radioactive radiation or the fluorescence intensity 
during laser excitation that has been read out for a single spot 

- (biological) sample: a specimen from the organism under study, derived under
defined conditions; e.g., a sample from yeast culture, cell culture, a tumor 
specimen, a biopsy sample, a blood sample 

- universal target: a DNA fragment of defined sequence that binds to every spot on
an array, e.g. a universal oligonucleotide

The figures relate to preferred embodiments which exemplify the present invention and
show the following:

Figure 1: Signal intensities in dependence of DNA spot concentrations and of solute 
target concentrations for nylon membranes. Data shown are averaged over four 
independent measurements. Data for 4 representative probe species are shown. 

Figure 2: Signal intensities in dependence of DNA spot concentrations and of solute
target concentrations for glass slides. Data shown are averaged over four independent
measurements. Data for 4 representative probe species are shown.

Figure 3: Data (blue) and fitted model curve (red) for a representative probe fragment on
both nylon membrane (a) and a glass slide (b).



Computational Procedures 

The values obtained from the hybridization to a universal target, i.e. the read-out signal
intensities from the dilution series spots, are taken as a basis for calculating the
parameters of a model function by means of non-linear least-squares fitting. As model
function, we use the logistic function:

$$I = \frac{K I_0 e^{r c_p}}{K + I_0 \left(e^{r c_p} - 1\right)} \qquad (1)$$

where $I$ refers to the modeled signal intensity, $c_p$ refers to the probe (or DNA spot)
concentration, $K$ represents the asymptotic signal intensity for $c_p \to \infty$, $I_0$ is the
asymptotic signal intensity for $c_p \to 0$, and $r$ is a shape parameter. We could
demonstrate (see Results) that $r$ depends neither on the concentration of the solute
target cDNA nor on the nature of the probe sequence, and hence should be the same for
every spot on the array. Fitting is done by standard gradient optimization procedures for
non-linear fitting; we used the Newton-Raphson method as implemented in MATLAB
(MathWorks Inc., Natick, MA, U.S.A.). The confidence of the fitting procedure is greatly
enhanced by using several replicates of the dilution series on the chip; we recommend
using at least five.
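
As a minimal sketch of how the model behaves, the following Python snippet evaluates the
logistic function, reading Eq. (1) as I = K*I0*exp(r*c_p) / (K + I0*(exp(r*c_p) - 1)). The
function name and the parameter values are illustrative assumptions and are not taken
from the experiments. The code uses an algebraically equivalent form with exp(-r*c_p) so
that large spot concentrations do not overflow.

```python
import math


def logistic_signal(c_p, K, I0, r):
    """Modeled signal intensity for spot DNA concentration c_p (Eq. 1).

    Dividing numerator and denominator of Eq. (1) by exp(r * c_p) gives
    K * I0 / (I0 + (K - I0) * exp(-r * c_p)), which avoids overflow.
    """
    return K * I0 / (I0 + (K - I0) * math.exp(-r * c_p))


if __name__ == "__main__":
    # Hypothetical parameters, chosen only for illustration.
    K, I0, r = 1000.0, 10.0, 0.5
    for c_p in (0.0, 2.0, 5.0, 10.0, 20.0, 50.0):
        print(f"c_p = {c_p:5.1f}   I = {logistic_signal(c_p, K, I0, r):8.2f}")
```

At c_p = 0 the function returns I0, and for large c_p it approaches K, which matches the
two asymptotes named in the definition of the model.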

The model function is used to determine the set of spots whose values need correction for
the influence of spot DNA concentration. A critical probe concentration is defined by

$$c_{crit} = c_p \,\big|_{\,I = 0.85 K} \qquad (2)$$

or, analytically

$$c_{crit} = \frac{1}{r} \ln\left[\frac{17 \left(K - I_0\right)}{3 I_0}\right] \qquad (3)$$


A hybridization with a universal target will provide for an array of a given series both the
possibility to calculate the parameters of the model function, based on the values from the
dilution series, and to determine for every spot in the array whether its spot DNA
concentration is below or above the critical probe concentration. If it is above, no
correction is needed since the probe concentration does not influence signal intensity. If it
is below, the signal intensity is, in first approximation, linearly dependent on the spot
DNA concentration, and values in later hybridizations can be corrected by taking the
determined probe concentration from the hybridization with the universal target.
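
One way to picture this decision is sketched below in Python. The spot DNA concentration
is estimated by inverting the logistic model for the universal-target signal, and spots
below the critical concentration are rescaled. The text does not state the exact
correction formula, so the linear rescaling by c_crit / c_p is an assumption made for
illustration, as are all function names.

```python
import math


def critical_concentration(K, I0, r):
    """c_crit: the spot DNA concentration where the model gives I = 0.85 * K."""
    return math.log(17.0 * (K - I0) / (3.0 * I0)) / r


def probe_concentration(universal_signal, K, I0, r):
    """Invert the logistic model to estimate a spot's DNA concentration.

    Only defined for I0 < universal_signal < K.
    """
    if not I0 < universal_signal < K:
        raise ValueError("signal must lie strictly between I0 and K")
    ratio = universal_signal * (K - I0) / (I0 * (K - universal_signal))
    return math.log(ratio) / r


def correct_signal(sample_signal, universal_signal, K, I0, r):
    """Rescale a later measurement if the spot lies below c_crit.

    ASSUMPTION: the linear rescaling by c_crit / c_p is illustrative only;
    the text merely states that the dependence is approximately linear.
    """
    if universal_signal >= 0.85 * K:
        return sample_signal  # at or above c_crit: no correction needed
    c_p = probe_concentration(universal_signal, K, I0, r)
    return sample_signal * critical_concentration(K, I0, r) / c_p
```

With this choice the correction is continuous at c_crit: a spot exactly at the critical
concentration is scaled by a factor of 1, and a spot at half the critical concentration
by a factor of 2.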

Experiments on nylon and polypropylene membranes 

As probe sequences, 16 different PCR fragments from yeast ORFs (Tab. 3) have been
spotted in 9 different concentrations (Tab. 1) on both nylon and polypropylene
membranes. These arrays have been hybridized with a universal oligonucleotide in 8
different concentrations (Tab. 2). Hybridization experiments have been carried out in
quadruplicate, and signal intensities subsequently averaged.



Table 1: spot DNA amounts used

amount of immobilized DNA
10 amol | 50 amol | 100 amol | 500 amol | 1 fmol | 5 fmol | 10 fmol | [illegible] fmol | 50 fmol

Table 2: target concentrations tested

concentration of target (radioactively labeled universal oligonucleotide)
2 pM | 8 pM | 20 pM | 40 pM | 80 pM | 120 pM | 160 pM | 200 pM



Table 3: probe sequences used

ORF identifier | length [bp]
YAR030C | 342
YAR043C | 350
YCR025C | 411
YCL022C | 516
YAL055W | 543
YAR037W | 600
YCL031C | 894
YCR034W | 1044
YCR045C | 1476
YCR024C | 1479
YCR048W | 1833
YBR280C | 1929
YDR422C | 2592
YCR030C | 2613
YDR430C | 2790
YAL035W | 3009



As is obvious from Fig. 1, the measured signal intensities depend in a non-linear fashion
on both the DNA spot concentration and the concentration of the solute target. The
dependence on DNA spot concentration is best described by an approximately linear
interval at low spot concentrations that is followed by a saturation region where no
increase in signal intensity can be seen if the spot concentration is increased.

There are no differences in principal shape to the other probe species and to data from
polypropylene membranes (data not shown).



Experiments on glass slides

Poly-L-lysine-coated glass slides have been used as solid support. The 16 PCR fragments
mentioned above (Tab. 3) have been spotted in quadruplicate onto these slides in 8
different concentrations (Tab. 4). The arrays have been hybridized with a fluorescently
marked universal oligonucleotide in 5 different concentrations (Tab. 5).

Table 4: amount of spot DNA

[values illegible in the source]

Table 5: solute target concentrations used

[illegible] | [illegible] | [illegible] | [illegible] | 200 nM

Data are shown in Fig. 2, and represent averages over the 4 spots on a single slide and the
4 independent replicate hybridizations that have been performed. Apart from a coarser
resolution with regard to high DNA spot concentrations and a higher variability of the
data, no principal difference is obvious compared to the experiments on nylon or
polypropylene membranes.
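
The averaging described here can be pictured with a short Python sketch. The data
layout (one list of replicate-spot signals per hybridization) and the function name are
assumptions for illustration, and the numbers in the demo are made up rather than
measured. The sample standard deviation is returned as well, as one simple way to
quantify the variability mentioned above.

```python
from statistics import fmean, stdev


def average_replicates(intensities):
    """Pool all spot readings for one probe at one spot concentration.

    `intensities` is a list of hybridizations, each a list of the
    replicate-spot signals read from that slide.
    Returns (mean, sample standard deviation) over all pooled readings.
    """
    pooled = [value for slide in intensities for value in slide]
    return fmean(pooled), stdev(pooled)


if __name__ == "__main__":
    # Hypothetical signals: 4 hybridizations x 4 replicate spots.
    readings = [
        [410.0, 395.0, 402.0, 420.0],
        [388.0, 405.0, 399.0, 412.0],
        [430.0, 415.0, 408.0, 421.0],
        [392.0, 401.0, 397.0, 409.0],
    ]
    mean, spread = average_replicates(readings)
    print(f"mean = {mean:.1f}, standard deviation = {spread:.1f}")
```

When every slide carries the same number of replicate spots, pooling all readings gives
the same mean as averaging the per-slide means.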

Fitting of model curve to data

Equation (1) has been fitted to the data by means of non-linear optimization. Of main
interest here is the estimate for the shape parameter r. This parameter describes the
steepness of the curve and hence determines the critical spot concentration, which is
used to split the range of DNA spot concentrations into a non-influential section at high
spot concentrations and a range where the signal depends on the spot concentration. The
results are shown for representative probe fragments both for nylon membranes and for
glass slides (Fig. 3). The model function fits the data well for the transition from
influential to non-influential region, while deviating considerably in the region with low
spot concentration. However, with regard to the intended purpose, a good fit in this
region is not needed.
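
A least-squares fit of the logistic model can be sketched in a few lines of Python. The
text names the Newton-Raphson method in MATLAB; the sketch below swaps in a plain
derivative-free coordinate search instead, because it needs no linear algebra and never
increases the sum of squared residuals. All names, the starting values and the synthetic
dilution series are assumptions for illustration, and signals are assumed to be positive.

```python
import math


def logistic_signal(c_p, K, I0, r):
    """Logistic model signal in an overflow-safe, equivalent form."""
    return K * I0 / (I0 + (K - I0) * math.exp(-r * c_p))


def sum_sq_residuals(params, conc, signal):
    """Sum of squared differences between model and measured signals."""
    K, I0, r = params
    return sum((logistic_signal(c, K, I0, r) - s) ** 2
               for c, s in zip(conc, signal))


def fit_logistic(conc, signal, r_start=1.0, sweeps=500):
    """Coordinate search over (K, I0, r).

    Each move multiplies or divides one parameter by (1 + step), so all
    parameters stay positive; a move is kept only if it lowers the
    sum of squared residuals. The step is halved when no move helps.
    """
    params = [max(signal), min(signal), r_start]
    best = sum_sq_residuals(params, conc, signal)
    step = 0.5
    for _ in range(sweeps):
        improved = False
        for i in range(3):
            for factor in (1.0 + step, 1.0 / (1.0 + step)):
                trial = list(params)
                trial[i] *= factor
                value = sum_sq_residuals(trial, conc, signal)
                if value < best:
                    params, best, improved = trial, value, True
        if not improved:
            step /= 2.0
            if step < 1e-9:
                break
    return params, best


if __name__ == "__main__":
    # Synthetic, noise-free dilution series (not data from the experiments).
    conc = [0, 1, 2, 4, 6, 8, 10, 12, 15, 20]
    signal = [logistic_signal(c, 1000.0, 10.0, 0.5) for c in conc]
    (K, I0, r), sse = fit_logistic(conc, signal)
    print(f"K = {K:.1f}, I0 = {I0:.2f}, r = {r:.3f}, residual = {sse:.3g}")
```

For real data a dedicated non-linear least-squares routine would normally be preferable;
the coordinate search is used here only to keep the example self-contained.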

Claims



1. Method for determining absolute mRNA quantities by means of cDNA microarrays
with the following steps:

a. providing a microarray;

b. providing at least one or more dilution series of control spots on the microarray;

c. hybridizing with a corresponding control cDNA of known concentration;

d. deriving reference data from the hybridization and

e. using said reference data to calculate absolute mRNA concentrations in one or
more samples used.

2. Method according to claim 1, wherein the reference data comprise a model curve 
which is fitted or adapted to the obtained signals for control. 

3. Method according to any one of the preceding claims, wherein cDNA templates for
the genes that are to be probed for are first amplified by large-scale multiplex
polymerase chain reaction (PCR) to obtain amplified fragments.

4. Method according to claim 3, wherein said amplified fragments are then transferred
to the microarray by means of roboting devices which are able to deliver nanoliter
quantities with a spatial precision of better than 100 µm.

5. Method according to any one of the preceding claims, wherein the reference data 
obtained from the hybridization, e.g., the read-out signal intensities from the dilution 
series spots, are taken as a basis for calculating the parameters of a model function by
means of non-linear least-squares fitting. 

6. Method according to claim 5, wherein for the model function the following function
is used:

$$I = \frac{K I_0 e^{r c_p}}{K + I_0 \left(e^{r c_p} - 1\right)}$$

where $I$ refers to the modeled signal intensity, $c_p$ refers to the probe (or DNA spot)
concentration, $K$ represents the asymptotic signal intensity for $c_p \to \infty$, $I_0$ is the
asymptotic signal intensity for $c_p \to 0$, and $r$ is a shape parameter.

7. Method according to any one of claims 2-6, wherein fitting of the reference data is
done by gradient optimization procedures.

8. Method according to claim 7, wherein for non-linear fitting, the Newton-Raphson
method is used.

9. Method according to any one of the preceding claims, wherein a critical probe
function is used to determine the set of spots whose values need correction for the
influence of spot DNA concentration, the critical probe concentration function being
defined by

$$c_{crit} = c_p \,\big|_{\,I = 0.85 K}$$



10. Computer-program product comprising program code means stored on a
computer-readable medium for performing the computable part of the method of at least
one of the preceding claims when said program product is run on a computer.

11. System, particularly for performing the method of at least one of claims 1-15, 
comprising: 

a. a microarray containing at least one or more dilution series of control spots;

b. means for hybridizing with a corresponding control cDNA of known
concentration; 

c. means for deriving reference data from said hybridization and 

d. means for making use of said reference data for calculating absolute mRNA 
concentrations in one or more samples used. 

12. Use of a method according to at least one of claims 1-9, a computer program product
according to claim 10 and/or a system according to at least one of claims 11 or 12 for
determining absolute mRNA quantities on cDNA microarrays.

Abstract

Method and system for determining absolute mRNA quantities

The present invention is particularly directed to a method and a system for determining
absolute mRNA quantities, specifically on cDNA microarrays. The following steps and
respective means are provided: providing a microarray; providing at least one or more
dilution series of control spots on this microarray; hybridizing with a corresponding
control cDNA of known concentration; deriving reference data from the hybridization
and using said reference data to calculate absolute mRNA concentrations in one or more
samples used.